HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 #14624

deepsek · 2025-07-10T18:31:04Z

Added matrix core support (MFMA instructions) for MMQ kernels.
Enabled stream-K for CDNA3 so that it works with MMQ kernels.
Removed usage of the hardcoded WARP_SIZE constant in MMQ kernels.
NOTE: What are your thoughts on removing all uses of hardcoded constants that are specific to NVIDIA (like WARP_SIZE) in order to support other GPUs?
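
As a rough illustration of the stream-K idea mentioned above: instead of assigning whole output tiles to workgroups, the k-iterations of all tiles are laid out as one linear range and split evenly across a fixed number of workgroups, so a workgroup may finish one tile and continue into the next. The host-side C++ sketch below shows only that partitioning arithmetic. The names `WorkRange` and `stream_k_partition` are hypothetical and are not taken from this PR or from llama.cpp, and the fixup step that combines partial results for tiles shared by two workgroups is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration only; not part of llama.cpp.
struct WorkRange {
    int64_t begin; // first global k-iteration owned by this workgroup (inclusive)
    int64_t end;   // one past the last global k-iteration owned (exclusive)
};

// Split ntiles * iters_per_tile k-iterations as evenly as possible
// across nblocks workgroups. Global iteration i belongs to output tile
// i / iters_per_tile, so a range may start or end in the middle of a tile.
std::vector<WorkRange> stream_k_partition(int64_t ntiles, int64_t iters_per_tile, int64_t nblocks) {
    assert(ntiles >= 0 && iters_per_tile > 0 && nblocks > 0);

    const int64_t total = ntiles * iters_per_tile;

    std::vector<WorkRange> ranges;
    ranges.reserve(static_cast<std::size_t>(nblocks));
    for (int64_t b = 0; b < nblocks; ++b) {
        // Integer division keeps the ranges contiguous and makes their
        // sizes differ by at most one iteration.
        ranges.push_back({ total * b / nblocks, total * (b + 1) / nblocks });
    }
    return ranges;
}
```

For example, with 10 tiles of 8 k-iterations each and 3 workgroups, the ranges are [0, 26), [26, 53), and [53, 80), so the tiles containing iterations 26 and 53 are each shared by two workgroups and would need their partial sums combined.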

@JohannesGaessler @ggerganov
P.S. I am part of an AMD team actively working on enabling the AMD feature set in llama.cpp. We would like to get on a call to discuss some future PR plans for additional backends, flash attention changes, etc.

EDIT:
Updated to add some performance charts for the DeepSeekV3 model.

[Chart: Upstream vs ROCm Fork Development]

[Chart: MI300X vs H100 Throughput Test]

JohannesGaessler · 2025-07-10T18:47:19Z

I would be happy to get on a call with you to discuss AMD hardware support. My email address can be found on my GitHub page.

ggerganov · 2025-07-11T07:48:35Z

> P.S. I am part of an AMD team actively working on enabling AMD feature set on llama.cpp. We would like to get on call to discuss some future PR plans for additional backends, flash attention changes, etc.

@deepsek Thanks for the contribution and for reaching out. On topics related to the CUDA backend, @JohannesGaessler is the best person to consult with. For additional backends, @slaren can provide guidelines and advice. I'll be happy to provide input on any matters as well.

I am also available for a call. Feel free to contact me.

Dampfinchen · 2025-07-11T12:18:25Z

Very nice to see the initiative. I assume improvements made for CDNA will also carry over to the consumer side next year when UDNA releases. So this is exciting news for the future of AMD products!

IMbackK · 2025-07-12T21:21:30Z

This certainly is good news.

JohannesGaessler · 2025-07-12T21:51:15Z

Sorry, I wanted to ask: @IMbackK since you've been working on AMD support, are you interested in joining the discussion?

IMbackK · 2025-07-14T16:44:52Z

> Sorry, I wanted to ask: @IMbackK since you've been working on AMD support, are you interested in joining the discussion?

Yes, certainly. It would help to avoid duplication of effort. I can be reached via email at uvos.xyz, user carl.

deepsek and others added 14 commits May 23, 2025 19:43

Feat: Enable MFMA instr for Q4_K

68da4e5

Fix: Missed template param

79f348a

Feat: Add MFMA instr for Q6_K, remove MMQ_NWARPS

89ba8a6

Merge branch 'ggml-org:master' into amd-integration

e57e563

Merge branch 'ggml-org:master' into amd-integration

9784a51

Merge branch 'ggml-org:master' into amd-integration

dad79b3

Perf: Fix Register Spilling Q6_K - Refactor kernel, launch_bound

ff60fa9

Perf: Refactor Q4_K, reduce register pressure

e8eeb34

Perf: Throughput Increase 4k->6.9k t/s

a161900

Perf: 7.1k tokens/sec

75d386a

Perf/Feat: Throughput 8.3k tokens/sec, Add support for all quants

0215a80

Feat: Remove warnings, deprecated __AMDGCN_WAVEFRONT_SIZE

aa35feb

Merge branch 'master' into amd-integration

ba17f62

Feat: Enable stream-k for CDNA3

5ab1491

deepsek requested a review from JohannesGaessler as a code owner July 10, 2025 18:31

github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Jul 10, 2025

Fix: Remove Trailing Whitespaces

fb2fd31

deepsek added 2 commits July 14, 2025 18:34

Fix: Unused Params Warnings, CUDA Build

b55d44a

-p512: 8.4k->9.5k - Account for DataPadding for writing tile_y

ab7c007

deepsek requested a review from ngxson as a code owner July 15, 2025 16:53

github-actions bot added the "devops" (improvements to build systems and github actions) label Jul 15, 2025
